Abstract 

We study the correlation of the occurrence of coronary heart disease (CHD) with the presence of 
the single-nucleotide polymorphism (SNP) at the -308 position of the tumor necrosis factor alpha 
(TNF-α) gene. We also consider the influence of the occurrence of type 2 diabetes (t2DM). Using 
Bayesian inference, we first pursue a bottom-up approach to compute the working hypothesis and 
the probabilities derivable from the data. We then pursue a top-down approach by modelling 
the signal pathway that causally connects the SNP with the emergence of CHD. We compute the 
functional form of the probability of CHD conditional on the presence of the SNP in terms of both 
the statistical and biochemical properties of the system. From the probability of occurrence of 
a disease conditional on a given risk factor, we explore the possibility of extracting information 
on the pathways involved in the occurrence of the disease. This is a first study that we want 
to systematise into a comprehensive formalism to be applied to the inference of the mechanism 
connecting the risk factors to the disease. 
I. INTRODUCTION 



We are interested in the association of diseases, in particular of coronary heart diseases 
(CHD), with genetic factors in order to determine underlying genetically-driven functional 
mechanisms that are causally related to the disease. In this context, environmental factors 
are regarded as contaminants. Among the risk factors for the emergence of CHD, genetic 
determinants may provide a wealth of information on the nature of the diseases, which can 
be used to develop new diagnosis and treatment methods. The study of these factors has 
attracted the effort of many research teams for the identification of disease susceptibility 
genes as well as acquired somatic mutations. Among these genes is that of the tumor 
necrosis factor alpha (TNF-α), a pleiotropic cytokine produced mainly by macrophages 
and T-cells which is involved in the inflammatory response of the immune system [1]. It 
has been suggested that the TNF-α gene affects the modulation of the lipid metabolism, 
obesity susceptibility and insulin resistance [2-4], thus being potentially implicated in the 
development of cardiovascular diseases (see Ref. [5] and references therein). However, the 
results on its association with CHD are contradictory [6-8], in part because they are based on 
frequentist analysis [9]. In order to infer the risk of CHD derived from potential risk factors 
it is important to develop a formalism that extracts all possible information from the data 
and combines them with other data sets on different intervening factors. Here we introduce 
a possible formalism based on Bayesian inference and test its applicability on the data set 
from Ref. [2]. Extending the application of this formalism to the integration of other data 
sets will be explored in a subsequent study. 

In this manuscript we attempt to quantify the risk of occurrence of CHD based on its 
association with a single-nucleotide polymorphism (SNP) in the TNF-α promoter. This entails 
the calculation of a probability distribution for the occurrence of CHD conditional on SNPs 
or other factors. The influence of other factors is here illustrated by the occurrence of type 
2 diabetes (t2DM). The causal relation between the risk factors and the occurrence of the 
disease is a function of the rates which characterize the implicated pathways. Here, knowing 
how the occurrence of the disease is distributed over the parameter space of the risk factors 
and knowing how the risk factors act at the biochemical level, we show how we can extract 
information on the pathway involved in the emergence of the disease. However, since the 
pathways involved are most likely interconnected with others, sourced by different factors, 
the next step would be to allow for the participation of various factors in the emergence 
of the same disease. There is a plethora of sparse phenomenological/symptomatic data on 
the simultaneous occurrence of SNPs and diseases from which correlations are tentatively 
drawn. No formalism, however, has yet been proposed that integrates the various data sets 
for a consistent inference of the correlations. Here we suggest one such formalism and apply 
it to a three-variable data set. The formalism developed here can be extended to other 
factors in the effort to systematise the sparse data to identify risk factors, to combine them 
into a comprehensive model for the mechanism that leads to the disease and therefrom to 
infer a universal law of gene mutation. 

                        CHD                      non-CHD 
                  t2DM    non-t2DM          t2DM    non-t2DM 
    SNP            43        67              26        48 
    non-SNP        63       168             109       159 

Table I: The data. Frequencies of the TNF-α -308 SNP in CHD patients, t2DM patients and 
controls. 

When, instead of computing the probability distribution of some quantity produced by 
the process, we compute the conditional probability of an unsolved variable in the process 
given the observed variables, we are solving an inverse probability problem. This requires the 
use of Bayes' theorem. The Bayesian approach has been used extensively for parameter 
inference and model selection, from cosmology [10-12] to biology [13-15] among many others. 
In this case we observe the occurrence of a given disease and the correlation with a SNP. 
However, we do not have a theory of how to go to the SNP from the disease. Here we 
relate the SNP with the disease via a model of the potentially implicated pathways. The 
parameters of this model are the relevant factors that we want to infer from the data. Such 
a theory would be important as a first step to predict the occurrence of a genetically-driven 
disease for a given polymorphism as well as to understand the mechanism of genetic mutation 
from which polymorphisms derive. In this study we propose a solution to the first problem 
and will approach the second problem in a forthcoming study. 

The manuscript is organized as follows. In section II we select the working hypothesis for 
the relation between the SNP and CHD on the basis of the Bayes factors and compute the 
probabilities on the presence of the SNP derivable from the data. One of these will be used 
as the likelihood for the occurrence of CHD. In section III we suggest a simplistic model 
for the signalling pathway between the onset of the SNP and the emergence of CHD, and 
compute the posterior probability for the occurrence of CHD. Finally we comment on the 
results and indicate the research routes that we will be exploring next. 



II. BOTTOM-UP APPROACH 

We are interested in studying the correlation between polymorphisms and CHD based on 
available data. We base our analysis on the data reported in Ref. [2], which consist of 
frequencies of occurrence of CHD as a function of the SNP at position -308 of the TNF-α 
promoter. That study also advances a correlation between the SNP and an increased predisposition 
to CHD in type 2 diabetic patients, and comprises an analysis of the dependence 
of diabetes on gender. 

The sample of CHD patients consists of $N_{\rm CHD} = 341$ randomly selected patients. Out of 
these, $N_{\rm CHD,t2DM} = 106$ also suffered from t2DM. Another sample of type 2 diabetic patients, 
numbering $N_{\overline{\rm CHD},{\rm t2DM}} = 135$, was selected among non-CHD patients. These two samples, 
together with a control sample of $N_{\overline{\rm CHD},\overline{\rm t2DM}} = 207$ non-CHD, non-diabetic patients, were 
analysed for the occurrence of the -308 TNF-α SNP. Thus the total number of diabetic 
patients consists of a random and a non-random component with respect to the factor CHD. Since we 
are interested in studying the correlation between the SNP and CHD, we cannot use the 
data on the sample of non-CHD diabetic patients to extract information on the frequency of 
occurrence of CHD given that diabetes had occurred. We can, however, derive information 
on the frequency of diabetes given that the SNP or CHD occurred. The data are summarized 
in Table I. 
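The sample sizes quoted above can be cross-checked against the cell counts of Table I. The following is a minimal sketch, assuming that the second row of the table lists the patients without the SNP; the dictionary layout and the helper `total` are illustrative names, not part of the original analysis.

```python
# Counts transcribed from Table I; keys are (snp, chd, t2dm) flags.
# Assumption: the second table row holds the non-SNP patients.
counts = {
    (True, True, True): 43, (True, True, False): 67,
    (True, False, True): 26, (True, False, False): 48,
    (False, True, True): 63, (False, True, False): 168,
    (False, False, True): 109, (False, False, False): 159,
}

def total(snp=None, chd=None, t2dm=None):
    """Sum the counts over every cell compatible with the given flags (None = marginalise)."""
    return sum(
        n for (s, c, d), n in counts.items()
        if (snp is None or s == snp)
        and (chd is None or c == chd)
        and (t2dm is None or d == t2dm)
    )

print("N_CHD            =", total(chd=True))
print("N_CHD,t2DM       =", total(chd=True, t2dm=True))
print("N_nonCHD,t2DM    =", total(chd=False, t2dm=True))
print("N_nonCHD,nont2DM =", total(chd=False, t2dm=False))
print("N_SNP / N        =", total(snp=True), "/", total())
```

Under this reading of the table, the marginal sums agree with the sample sizes stated in the text.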



A. Model comparison: the Bayesian evidence 

Given the data, we can derive the influence of CHD and t2DM on the SNP. We have three 
variables, namely occurrence of CHD (CHD), occurrence of t2DM (t2DM) and presence 
of the SNP (SNP), and six hypotheses for the presence of the SNP. The hypotheses are 
the following: $H_{00}$: the probability of the SNP does not depend on the occurrence of either 
CHD or t2DM; $H_{01}$: the probability of the SNP depends on the occurrence of CHD; $H_{10}$: 
the probability of the SNP depends on the occurrence of t2DM; $H_{11}$: the probability of the 
SNP depends on the independent occurrence of both CHD and t2DM; $H_{11}^{d}$: the probability 
of the SNP depends on the occurrence of CHD and on the occurrence of t2DM given that 
CHD is present; $H_{11}^{dm}$: the probability of the SNP depends on the occurrence of t2DM and 
on the occurrence of CHD given that diabetes is present. These are schematically depicted 
in Fig. 1. 

We note that, given the selection criterion for the population of $N_{\overline{\rm CHD},{\rm t2DM}}$, we cannot use 
the corresponding data to infer the occurrence of CHD given the occurrence of t2DM, since 
they would bias the results. For this reason, $H_{11}^{dm}$ is excluded as a viable hypothesis given 
the data collected. We proceed to compare the remaining hypotheses based on the Bayesian 
evidence. The probability of a hypothesis given the data is the posterior probability of the 
corresponding model [16] 

$$P(H_i|D) = \frac{P(D|H_i)\,P(H_i)}{P(D)}, \qquad (1)$$

where $P(D|H_i)$ is the evidence, $P(H_i)$ is the prior probability of $H_i$ and 
$P(D) = \sum_i P(D|H_i)P(H_i)$. In order to infer which hypothesis is more likely in view of the data, 
we compare the evidence computed for the alternative hypotheses. The evidence is the 
integral of the likelihood over the parameter space $\theta$ of the model 

$$P(D|H_i) = \int d\theta\, P(D|\theta \wedge H_i)\,P(\theta|H_i). \qquad (2)$$
Figure 1: Diagram of the hypotheses considered. The circles denote the corresponding 
variables. The arrows denote a correlation between the encircled variables which they connect. A 
straight arrow indicates that the probability of the variable at the tip of the arrow is conditional 
on the variable at the origin of the arrow. A curved arrow indicates that the probability of the 
variable at the tip of the arrow is conditional on the variable at the inflection of the arrow, which 
moreover is conditional on the variable at the origin of the arrow. In the upper set we have $H_{00}$, 
$H_{01}$ and $H_{10}$; in the lower set we have $H_{11}$, $H_{11}^{d}$ and $H_{11}^{dm}$. 



Assuming equal prior probabilities for the different hypotheses, then 

$$\frac{P(H_i|D)}{P(H_j|D)} = \frac{P(D|H_i)}{P(D|H_j)}. \qquad (3)$$
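As an illustration of how an evidence ratio of this kind can be evaluated, the sketch below assumes that each hypothesis splits the data into independent groups, each with a Bernoulli likelihood and a uniform prior on its frequency, so that the integral in Eqn. (2) reduces to a Beta function. This is an assumption made for the example and need not coincide with the likelihood of Appendix A; the function names and the toy counts are hypothetical.

```python
from math import exp, lgamma

def log_evidence(n_success, n_total):
    """Log evidence of a Bernoulli sequence with a uniform prior on its frequency.

    Integrating p**n * (1 - p)**(N - n) over p in [0, 1] gives the Beta
    function B(n + 1, N - n + 1), evaluated here through log-gamma.
    """
    n_fail = n_total - n_success
    return lgamma(n_success + 1) + lgamma(n_fail + 1) - lgamma(n_total + 2)

def bayes_factor(groups_i, groups_j):
    """Evidence ratio P(D|H_i)/P(D|H_j) for two partitions of the same data.

    Each hypothesis is a list of (n_success, n_total) groups, one per
    independent frequency parameter; equal hypothesis priors are assumed.
    """
    log_i = sum(log_evidence(n, N) for n, N in groups_i)
    log_j = sum(log_evidence(n, N) for n, N in groups_j)
    return exp(log_i - log_j)

# Hypothetical toy counts (not the study data): 30/100 carriers among cases
# and 10/100 among controls, compared with a single pooled frequency.
split = [(30, 100), (10, 100)]
pooled = [(40, 200)]
print(bayes_factor(split, pooled))
```

Working with log-gamma rather than factorials keeps the computation stable for sample sizes of several hundred, which is the regime of the data considered here.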

We compute the evidence for the five hypotheses described above (for details see Appendix 
A). In order to compare the hypotheses, we take the logarithm of the ratio of the corresponding 
evidences, $P(H_i|D)/P(H_j|D) = B_{ij}$, which we present in Table II. This quantity 
is known as the Bayes factor and gives empirical levels of significance for the strength of 
the evidence. It also encapsulates the Occam factor, which measures the adequacy of the 
hypothesis to the data over the parameter space of the hypothesis [16]. The levels of significance 
ascribed to the Bayes factor are calibrated by the Jeffreys scale [17] as follows: if 
$1.0 < B_{ij} < 3.2$, $H_i$ should not be favoured over $H_j$; if $3.2 < B_{ij} < 10$, there is substantial 
evidence for $H_i$ over $H_j$; if $10 < B_{ij} < 100$ there is strong evidence, while for $B_{ij} > 100$ 
the evidence for $H_i$ should be considered decisive. In the first column we find the Bayes 
factors which relate each hypothesis with $H_{00}$. Since hypothesis $H_{00}$ describes the data as 
the result of a random process, this column measures the preference for a departure from 
randomness [18]. From these values we infer that all hypotheses are substantially favoured 
over $H_{00}$. The values seem to suggest that $H_{11}^{d}$ is also favoured over the other hypotheses; 
however, they are not sufficient to infer substantial evidence. Since $H_{11}^{d}$ is the hypothesis 
that exhibits the most substantial evidence over the null hypothesis, we will take $H_{11}^{d}$ as 
our working hypothesis upon which we will base our subsequent inferences. 
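The calibration quoted above can be written as a small lookup. This is only a sketch of the scale as stated in the text: the treatment of the boundary values 3.2, 10 and 100, and of Bayes factors at or below 1, is an assumption, since the text gives strict inequalities only.

```python
def jeffreys_category(bayes_factor):
    """Map a Bayes factor B_ij onto the verbal scale quoted in the text."""
    if bayes_factor <= 1.0:
        return "H_i not favoured (B_ij <= 1)"   # case not covered by the quoted scale
    if bayes_factor < 3.2:
        return "H_i should not be favoured"
    if bayes_factor < 10:
        return "substantial"
    if bayes_factor < 100:
        return "strong"
    return "decisive"

# 1.73 and 7.07 appear in Table II; 42.0 and 250.0 are made-up illustrations.
for b in (1.73, 7.07, 42.0, 250.0):
    print(b, jeffreys_category(b))
```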
4.08 
0.93 
$H_{11}$ 

7.07    1.77    1.62    1.73 

Table II: The Bayes factors for the hypotheses considered. Here $B_{\rm row,col} = H_{\rm row}/H_{\rm col}$. 



B. Model fitting 



Having inferred from the computation of the evidence which of the possible hypotheses 
is most likely to be compatible with the data on the presence of the SNP, we proceed to 
compute the probability for the occurrence of the polymorphism. Let H denote our working 
hypothesis. Then the probability that the SNP is present is 

$$P({\rm SNP}|H) = \int d\theta\, P({\rm SNP}|\theta)\,P(\theta|D \wedge H) = \int d\theta\, P({\rm SNP}|\theta)\,\frac{P(D|\theta \wedge H)\,P(\theta|H)}{P(D|H)}. \qquad (4)$$

Since $H_{11}^{d}$ consists of a two-component hypothesis [19], each component being described by two 
parameters, the resulting parameter space is four-dimensional. The equation above must be 
generalized for a multidimensional parameter space where each factor is no longer a scalar 
but instead a (4x4) matrix. Since each component consists of two disjoint sets, the matrix 
is diagonal, each component being weighted by the relative size of the population. We then 
write 

$$P({\rm SNP}|H) = \int d^2p\, P({\rm SNP}|p)\,P(p|D \wedge H) + \int d^2\bar{p}\, P({\rm SNP}|\bar{p})\,P(\bar{p}|D \wedge H). \qquad (5)$$

Here the indices range over the two-dimensional parameter spaces, with $p = (p_{01}, p_{11})$ and 
$\bar{p} = (\bar{p}_{01}, \bar{p}_{11})$, where for simplicity we have dropped the tilde from the notation used in 
Appendix A. In particular, $p_{11}$ is the frequency of SNP given the occurrence of CHD and 
$p_{01}$ the frequency of SNP given non-occurrence of CHD, both subject to non-occurrence 
of t2DM, whereas $\bar{p}_{11}$ is the frequency of SNP given that t2DM has occurred and $\bar{p}_{01}$ the 
frequency of SNP given that t2DM has not occurred, both subject to CHD having occurred. 
Let $P({\rm SNP}|p) = p$ and $P({\rm SNP}|\bar{p}) = \bar{p}$. The posterior probability of $p$ is by Bayes' 
theorem 

$$P(p|D \wedge H) = \frac{P(D|p \wedge H)\,P(p|H)}{P(D|H)} = \frac{P(D|p \wedge H)}{P(D|H)} \qquad (6)$$
and similarly for the posterior probability of $\bar{p}$. In the last step we assume for simplicity a 
uniform prior for both $p$ and $\bar{p}$.$^1$ Writing the evidence as 

$$P(D|H) = \int d^2p\, P(D|p \wedge H)\,P(p|H) + \int d^2\bar{p}\, P(D|\bar{p} \wedge H)\,P(\bar{p}|H), \qquad (7)$$



we find for hypothesis $H_{11}^{d}$ that 



P(SNP\H) 
1 

P{D\H) 



dpoi / dpu / dp i / dpn 



x 



. CHD,t2DN \ SNP,CHD,t2DN 
71 AT }P0l 

iv SNP,CHD,t2DN 
Nr 



+1 



\ pQ^^SNP,CHD,t2DN 



+ (1 _ 7 ) ( ^CHD,t2DN \NsNP,CHD, t 2DN + l^ _ pii) NsWP,CHD,t2 D N 

. NsNP,CHD,t2DN J 



Nr 



- JV SNP,CHD,t2DN/ 



N SNP,CHD,t2DN + 1 ^ 1 _ ^_ S N mF)VWB : 



which yields 



+ (1"7) 



P(SNP\H) 



N 



' 'i'J-'-r2i>\ \ p N SNP,CHD,t2DN + 1 , 1 _ p^Ngjfp^gjj^jf 



NsNP,CHD,t2DN/ 



(8) 



1 



P(D\H) 



x 



7 



A, 



SNP,CHD,t2DN + 1 



T CHD.t2DN + 2 )( N CHD,t2DN + 1 ) 

^ NsNP,CHD,t2DN + 1 

(NcHD,t2DM + 2)(NcHD,t2DM + 1) 
^SNP,ClTD,t2DN + 1 



+7 



CHD,t2DN 



+ 2)(iV; 



CHD,t2DN 



+ (1"7) 



A, 



SNP,CHD,t2DN 



+ 1) 
+ 1 



CHD.t2DM 



+ 2)(A ( 



CHD.t2DM 



+ 1 



(9) 



Here 7 = N CHDmM /N CHD and 7 = A ; 



CHD,t2DM 



/AW, with P(£>|if) given by Eqn. (A9). 



Substituting the values from Table I we find that $P({\rm SNP}|H) = 0.30 \pm 0.001$, which we 
identify as the effective mutation rate $\lambda_{\rm eff}$ of the Poisson probability distribution describing 
the occurrence of the SNP. This value is different from the naive guess $\lambda = N_{\rm SNP}/N = 0.27$ or 
from the more elaborate one arising from the assumption of the null hypothesis [see Appendix 
B for the derivation]. It then follows that the posterior probability for the occurrence of $n$ 
mutations in a population of size $N$ is 

$$P(n|\lambda_{\rm eff} \wedge N) = \exp[-\lambda_{\rm eff} N]\,\frac{(\lambda_{\rm eff} N)^n}{n!}. \qquad (10)$$
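The Poisson distribution of Eqn. (10) and the naive rate quoted in the text can be evaluated with a few lines. This is a minimal sketch: the population size used below is a hypothetical value chosen for illustration, and the function name is not from the original work.

```python
from math import exp, lgamma, log

def poisson_pmf(n, rate, size):
    """P(n | rate, size) = exp(-rate*size) * (rate*size)**n / n!, as in Eqn. (10)."""
    mean = rate * size
    # Work in log space so that large n or large means do not overflow.
    return exp(-mean + n * log(mean) - lgamma(n + 1))

lambda_naive = 184 / 683      # N_SNP / N from the counts of Table I
lambda_eff = 0.30             # value quoted in the text
size = 100                    # hypothetical population size, for illustration only

print(round(lambda_naive, 2))
print(poisson_pmf(30, lambda_eff, size))
```

The naive estimate rounds to the 0.27 stated above, which gives a quick consistency check on the reading of Table I.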



1 This choice of prior is justified by the absence of an a priori bias on the values of these parameters. 
Similarly we compute 
P(SNP\CHD A H) 



d 2 p P(SNP\CHD A p)P(p\D A H) = / d 2 p 



r2 P P(D\pAH) 



P(D\H) 



P(D\H) 
+ (1-7) 



N 



7 



SNP,CHD,t2DN 



1 



CHD,t2DN ,t2DN 
NsNP,CHD,t2DN + 1 



+ 1 



{NcHD,t2DM + 2)(NcHD,t2DM + 1 

and find that $P({\rm SNP}|{\rm CHD} \wedge H) = 0.20 \pm 0.001$. We can also compute 



P(t2DN\CHD A H) 



dp P(t2DM\CHD A p)P(p\D A if) 

1 NcHD,t2DN + 1 

P(D|tf) (N CHD + 2)(N CHD + 1) 



(11) 



(12) 



finding that $P({\rm t2DM}|{\rm CHD} \wedge H) = 0.09 \pm 0.001$. The errors indicated were computed from 
error propagation, assuming the error of a counting result $n$ to be $1/\sqrt{n}$. 



III. TOP-DOWN APPROACH 



We now proceed to estimate the influence of the SNP on the occurrence of CHD. We 
want to find the posterior probability of the occurrence of CHD given the presence of the 
SNP, i.e. 

$$P({\rm CHD}|{\rm SNP} \wedge H) = \frac{P({\rm SNP}|{\rm CHD} \wedge H)\,P({\rm CHD}|H)}{P({\rm SNP}|H)}. \qquad (13)$$

Here P(CHD\H) is the prior probability of CHD and P(SNP\CHD A H) is the likelihood 
of CHD for a fixed SNP. The remaining term P(SNP\H) has no CHD dependence and 
can thus be absorbed into the normalization constant. It is known as the evidence or the 
marginal likelihood. 
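To make the role of each factor in Eqn. (13) concrete, the sketch below combines the likelihood and the evidence quoted in Sec. II B with a prior. The prior value is purely hypothetical, since the text does not fix $P({\rm CHD}|H)$ at this point; the function name is likewise an illustrative choice.

```python
def posterior_chd_given_snp(likelihood, prior, evidence):
    """Bayes' theorem, Eqn. (13): P(CHD|SNP) = P(SNP|CHD) P(CHD) / P(SNP)."""
    if not 0.0 < evidence <= 1.0:
        raise ValueError("evidence must be a probability in (0, 1]")
    return likelihood * prior / evidence

likelihood = 0.20   # P(SNP|CHD ^ H), value quoted in Sec. II B
evidence = 0.30     # P(SNP|H), value quoted in Sec. II B
prior = 0.5         # hypothetical P(CHD|H); the text does not fix this number

print(posterior_chd_given_snp(likelihood, prior, evidence))
```

Because the evidence carries no CHD dependence, changing it rescales the posterior for all values of the prior by the same factor, which is why it can be absorbed into the normalization.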



A. A simplistic model for the signalling pathway 

Since the working hypothesis relates the presence of the SNP with both the occurrence 
of CHD and the occurrence of t2DM, we infer that the changes from the canonical pathway 
introduced by the SNP will have repercussions on the signalling cascades which regulate the 
emergence of CHD and t2DM. If we assume that the SNP will only affect one source signal, 
then this correlation suggests that the resulting signal transduction pathways interfere with 
one another. Functional interference can be derived from a common source signal or from 
common components downstream [20]. As far as the source signal is concerned, we allow 
for two possibilities: 1) the pathways have different source signals; 2) the pathways have the 
same source signal and diverge sufficiently far downstream. In case 1), in order to reproduce 
interference between the two signalling pathways, we can have two further sub-cases: 1a) 
the pathways share components and the effect of the SNP consists of either an alteration in 
the velocity of the affected signal or an alteration of one of the pathways; 1b) the pathways 
do not share components but the pathway altered by the SNP shares components with the 
unaltered one. These two sub-cases can be distinguished by the correlation between the two 
diseases in the absence of the SNP, with case 1a) describing the existence of an a priori 
correlation between the occurrence of the two diseases and case 1b) the absence of such 
correlation. In case 2) the interference is built-in so the effect of the SNP is similar to that 
in case 1a). The diagrams are depicted in Fig. 2, which we now proceed to describe. 

Figure 2: Diagram of the pathways with different source signals. The first column depicts 
the original architecture corresponding to the canonical pathway (i.e., in the absence of the SNP). 
The second column depicts one possible alteration of the corresponding pathway due to the SNP. 
Here the SNP acts on the signal $x_0$. The continuous lines represent the functioning pathways, and 
the dashed lines represent pathways which can suffer delay or cease to function entirely (in the case 
of previously existing pathways), or which could simply not come to exist (in the case of newly 
generated pathways). (a) Case 1a). For $y_0 = x_0$ this reduces to Case 2. (b) Case 1b). 

The variables x and y describe black boxes along the pathways that regulate the emergence 
of CHD and t2DM respectively. By black boxes we mean unresolved chemical reactions 
where no intervening elements are specified other than the input and the output reaction 
rates between two sequential black boxes. The index of the variables denotes the relative 
position in the pathway of the corresponding black box component. Thus '0' indicates the 
upstream component fed by the initial signal and which is subject to alteration upon the 
action of the SNP, whereas '2' indicates the final component which determines the emergence 
of the disease, with '1' denoting the intermediary component where the interference of the 
altered pathway with the unaltered one is manifested. Here the SNP affects the signal $x_0$ 
and propagates downstream through an altered or newly created pathway '6'. A variable 
which refers to an altered pathway is denoted by x Q , whereas a variable which refers to a 
newly created pathway is denoted by x^. The same rule holds for the corresponding rates. 
The cases where the original pathway ceases to function can be interpreted as inhibition. 
This is a simplistic description of the potential signalling pathways involved, which is but a 
caricature of the real biochemical system. Nonetheless it is the description that the data 
allow, and it moreover captures the functional correlations inferred. 

To select the working model we use as criterion the possibility of the emergence of a 
connection between the SNP and each disease without the participation of the other. We 
conceive three possible ways for the SNP to function: (1) the signal does not suffer any 
alteration from that in the absence of the SNP; (2) the signal is triggered at a smaller rate 
than in the absence of the SNP; (3) no signal is triggered. We can exclude (1) on the basis that it contradicts 
the working hypothesis. Moreover, since (3) can be described as the limiting case of (2) in which 
the altered signal vanishes, we proceed to solely analyse (2) with each suggested model according to the disease 
combinations which can be reproduced. 

Case 1a) If both $x_2$ and $y_2$ require that $x_1$ be acted upon by the products of both $x_0$ and 
$y_0$, then the temporal properties of the two pathways should be very close. In the case 
of a slower reaction rate, if the signalling pathways are assumed to form an isolated 
system then both diseases would occur. The way to prevent this would be by capturing the 
required reagents from neighbouring pathways. This scenario, however, would be beyond 
our current capabilities of inference and constraint. If instead $x_1$ has the flexibility of being 
independently activated by the two signals and of also independently acting upon $x_2$ and 
$y_2$, then the emergence of the diseases will depend on the supply of each pathway that 
develops from $x_1$ downstream. Should a single triggering signal be enough to supply both 
downstream pathways, it could be the case that no disease occurs. This would moreover 
depend on the phase difference between the generation of $x_1$ upon the activation of $x_0$ and 
$y_0$. Should one triggering signal not be enough, then either disease could occur. This could 
be prevented if a compensation mechanism were triggered so that, in the absence of an 
effective signalling from one source, the working source would be stimulated according to 
the deficiency. 

Case 1b) In this case $x_1$ and $y_1$ are independently sourced and independently develop 
their pathways downstream. In the case of a slower reaction rate, no interference would be 
generated as long as the SNP-affected pathway could still run. However, in the case of a 
more drastic reduction of the trigger of $x_1$, an alternative pathway would be intercepted and 
the required reagents diverted. This could lead to a case of competition. Should pathway 
$x$ be thus maintained, pathway $y$ could either collapse or continue, resulting in the disease 
implicated in the pathway $y$ emerging or not. If pathway $x$ cannot be maintained and the 
corresponding disease is not avoidable, then we can still have either emergence or not of the 
other disease, depending on the degree of reagent diversion. 

This empirical analysis can be complemented by the following quantitative one. We will 
assume that the system under study can be described as a dynamical one. In the absence of 
the SNP, the dynamical system is described by the rates indicated in Fig. 2, first column. If 
the SNP has occurred in the coding of the input signal in $x_0$, then the system will instead be 
sourced by the altered $x_0$ and described by the rates as indicated in Fig. 2, second column. 

We compute the fixed points (identified by the superscript '*') of the quantities involved 
in both the canonical pathway and the SNP-altered one. The fixed points describe the state 
of the system in dynamical equilibrium and are computed by setting the time derivatives equal 
to zero. Since we are interested in the probabilities which describe the average states of the 
system and not in the dynamical evolution that leads to those states, the fixed points are 
the variables to be used. 
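The idea of reading off fixed points by setting the time derivatives to zero can be illustrated on a toy system. The sketch below is not the system of case 1a) or 1b): it assumes a hypothetical linear cascade $x_0 \to x_1 \to x_2$ with made-up production rates `alpha0`, `alpha1` and decay rates `delta1`, `delta2`, and compares the analytic fixed point with the long-time limit of a forward-Euler integration.

```python
def cascade_fixed_point(x0, alpha0, alpha1, delta1, delta2):
    """Fixed point of a hypothetical linear cascade x0 -> x1 -> x2.

    dx1/dt = alpha0*x0 - delta1*x1 and dx2/dt = alpha1*x1 - delta2*x2;
    setting both derivatives to zero gives the values returned here.
    """
    x1_star = alpha0 * x0 / delta1
    x2_star = alpha1 * x1_star / delta2
    return x1_star, x2_star

def integrate(x0, alpha0, alpha1, delta1, delta2, dt=0.01, steps=20000):
    """Forward-Euler integration from x1 = x2 = 0, to check the fixed point numerically."""
    x1 = x2 = 0.0
    for _ in range(steps):
        dx1 = alpha0 * x0 - delta1 * x1
        dx2 = alpha1 * x1 - delta2 * x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2

rates = dict(alpha0=1.0, alpha1=0.5, delta1=0.8, delta2=0.4)   # made-up rates
canonical = cascade_fixed_point(x0=1.0, **rates)
altered = cascade_fixed_point(x0=0.6, **rates)                 # weaker source signal
print(canonical, altered)
```

In this toy cascade a weaker source signal lowers the downstream fixed point in proportion, which is the qualitative behaviour that the comparison between canonical and SNP-altered pathways relies on.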

For case la) 



dx i 
~dT 

d,X2 

~dt 
dm 
dt 



a® + a<$0(x o )\ y - (pQ + oij» + #>) *i 

eh 



(14) 
(15) 
(16) 



where the $\delta$'s denote decay rates. Here $\beta$ is the fraction of the output of $y_0$ which activates 
$x_1$ and thereby attempts to compensate for the deficiency of activation of $x_1$ derived from 
the SNP 
Q[x - Al hres - x 



(17) 



This term is reminiscent of the carrying capacity term which sets an upper limit to cellular 
growth [21]. Here we set a lower limit, as determined by $\Delta_0^{\rm thres}$, to the deficiency in $x_0$ caused 
by the altered signal for the onset, as imposed by the step function, of the compensating 
mechanism. This quantity would measure a functional marker for the presence of the SNP. 
We compare the fixed points in both the canonical pathway and the SNP-altered one, finding 
that 
This shows that, in the absence of the compensating term in $y_0$, a difference in $x_0$ will be 
reflected in a difference in both $x_2^*$ and $y_2^*$, and thus imply the occurrence respectively of CHD 
and diabetes. Thus the occurrence of each disease will be related both to the onset of the 
altered signal and to the change from the canonical pathway of its propagation downstream. 
Expressing $x_0$ and $y_2$ as functions of $x_2^*$, we find that 



x = -T- (A 2 x* 2 - A) 



V2 



B2 



(20) 
(21) 



where the $A$'s and $B$'s are functions of the biochemical rates, the canonical signal $x_0$ and 
the fixed point of the canonical pathway $x_2^*$. 
For case 1b) 
dt 
(22) 
(23) 
(24) 
(25) 



The fixed points for the canonical and altered pathways are related as follows 
(26) 
(27) 



In this case, the change from the sharing of a component while compensating for the supply 
of one pathway could cause the other to be depleted of essential reagents. The outcome, 
however, will depend on how much the intercepting pathway takes and how much the 
intercepted pathway can run canonically without. Similarly we find expressions for $x_0$ and $y_2$ as 
functions of $x_2^*$ 



Xq 



V2 = 



1 

A~ 
1 

B~2 



(A 2 x* 2 - A) 
(A 2 x* 2 -A)^ + B 



(28) 
(29) 



where similarly the $A$'s and $B$'s are functions of the biochemical rates, the canonical signal 
$x_0$ and the fixed points of the canonical pathways $x_2^*$ and $y_2^*$. 

In the following subsection we will use as the working description of the signalling pathway 
the dynamical system of case 1a). Given the similarity in the functional form of the dependence 
of the variables, similar conclusions would also be inferred for case 1b), with the interpretation 
only differing on the basis of the different structure of the pathways. 
B. Translation of the dynamical system into a probability description 



In order to compute $P({\rm CHD}|{\rm SNP} \wedge H)$, we proceed to write the probabilities in Eqn. (13) 
in terms of the variables in the description of the dynamical system. In particular, we want 
to compute 

$$P(x_2|x_0) = \frac{P(x_0|x_2)\,P(x_2)}{P(x_0)}, \qquad (30)$$

where the probability $P(x_2)$ is related to that for the occurrence of CHD and the likelihood 
$P(x_0|x_2)$ is related to that for the data on the SNP conditional on the prior for CHD. The 
quantity $P(x_0)$ is the evidence, which is found by marginalising the likelihood and is related 
to the probability for occurrence of the SNP. We will use the probabilities computed in the 
previous section to constrain the priors assumed here. We will then derive an expression 
for $P({\rm CHD}|{\rm SNP} \wedge H)$ in terms of both the statistical properties of the priors and the 
biochemical parameters of the transmission process from the SNP to the CHD. 

We assume that the probability for the occurrence of CHD is described by a Gaussian 
distribution with expectation value equal to the fixed point of the final component of the 
pathway, $\mu_{2P} = x_2^*$, and standard deviation $\sigma_{2P}$ 

$$P(x_2) = \frac{1}{\sqrt{2\pi}\,\sigma_{2P}} \exp\left[-\frac{(x_2 - \mu_{2P})^2}{2\sigma_{2P}^2}\right]. \qquad (31)$$



In the probability description, the quantities $\mu_{2P}$ and $\sigma_{2P}$ are properties of the prior knowledge 
of the distribution of the occurrence of CHD derived from population sampling and 
expressed in terms of the biochemical parameters of the system according to the model 
considered. In the absence of further data, these quantities characterize a theoretical prior 
which we can assume to be approximated by a binomial distribution for sufficiently large 
$N_{\rm CHD}$. The parameter of the binomial distribution is the frequency of occurrence of CHD, 
which has as maximum likelihood estimator $p_{\rm CHD} = N_{\rm CHD}/(N_{\rm CHD} + N_{\overline{\rm CHD}})$. The mean and 
the variance of the approximating Gaussian distribution are given by 

$$\mu_{2P} = N_{\rm CHD}\,p_{\rm CHD}, \qquad \sigma_{2P}^2 = N_{\rm CHD}\,p_{\rm CHD}\,(1 - p_{\rm CHD}), \qquad (32)$$

which yield respectively $\mu_{2P} = 170 \pm 1$ and $\sigma_{2P} = 9.2 \pm 1.6$. The fixed point corresponding 
to the canonical pathway denotes absence of disease, whereas deviations from this value 
will entail a non-vanishing probability of occurrence of CHD. In order to quantify this 
probability, we need to devise a criterion to determine the emergence of CHD. Deviations 
of the fixed point of an altered pathway from that of the canonical pathway are quantified 
by $\Delta_2$, the difference between the canonical and the altered values of $x_2^*$. This quantity would 
measure a non-environmental marker$^3$ for the occurrence of CHD [22]. For deviations larger than a 
threshold value $\Delta_2^{\rm thres}$ the disease will occur. The probability of occurrence of CHD will be 

$$P({\rm CHD}) = P(x_2 < \mu_{2P} - \Delta_2^{\rm thres}) = \frac{1}{\sqrt{2\pi}\,\sigma_{2P}} \int_{-\infty}^{\mu_{2P} - \Delta_2^{\rm thres}} dx_2\, \exp\left[-\frac{(x_2 - \mu_{2P})^2}{2\sigma_{2P}^2}\right]. \qquad (33)$$
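The numbers entering Eqns. (32) and (33) can be checked with a short script. This is a sketch under stated assumptions: the non-CHD sample size is taken to be the sum of the two non-CHD groups of Sec. II (135 + 207), the Gaussian tail is written in closed form through the error function, and the threshold of one standard deviation is a hypothetical value used only to exercise the function.

```python
from math import erf, sqrt

def gaussian_prior_parameters(n_chd, n_non_chd):
    """Mean and standard deviation of Eqn. (32) from the two sample sizes."""
    p_chd = n_chd / (n_chd + n_non_chd)
    mean = n_chd * p_chd
    std = sqrt(n_chd * p_chd * (1.0 - p_chd))
    return mean, std

def prob_below_threshold(threshold, std):
    """Gaussian tail of Eqn. (33): P(x_2 < mu - threshold) = [1 - erf(threshold/(sqrt(2)*std))]/2."""
    return 0.5 * (1.0 - erf(threshold / (sqrt(2.0) * std)))

# Assumption: N_nonCHD is the sum of the two non-CHD samples described in Sec. II.
mu_2p, sigma_2p = gaussian_prior_parameters(n_chd=341, n_non_chd=135 + 207)
print(round(mu_2p), round(sigma_2p, 1))

delta_thres = sigma_2p        # hypothetical threshold of one standard deviation
print(prob_below_threshold(delta_thres, sigma_2p))
```

With these sample sizes the script gives a mean that rounds to 170 and a standard deviation that rounds to 9.2, in line with the central values quoted after Eqn. (32).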



Deviations at the level of $x_2$ will be the result of the propagation along the pathway of 
deviations at the level of $x_0$ according to Eqn. (18) or (26), depending on the model considered. 

The likelihood of the data on $x_0$ given the variable $x_2$ is assumed also to follow a Gaussian 
distribution centred at x and with standard deviation ct 0d 



P(x \xV, 



2lXO { 



exp 



[x 



- x*^ 



2) 



2< 



(34) 



Here the quantities x and <t 0d describe properties of the data in the presence of the SNP 
expressed in terms of the biochemical parameters of the system. The likelihood of CHD 
will be given by the integral in x 2 because the SNP enters as data through the modelling 
of the system. Substituting xo = xo(xl) = (A 2 x 2 — A)/A , where the A's are functions of 
the statistical parameters fi 2p , x and a 0o as well as of the biochemical parameters, we find 
that 



P(SNP\CHD) 



1 



i,~ A thres 



27T<7 0l3 j-oo 

1 A 

2 2(A - A 2 ) 



erf 



dx 2 exp 
A + (/i 2p 



2< 
Ai hres )(A - A 2 ) 



(35) 



where erf stands for the error function given by the integral ${\rm erf}(x) = (2/\sqrt{\pi}) \int_0^x dy\, \exp[-y^2]$. 



This probability was computed in Eqn. (11). 



We can now derive the functional form of the evidence in terms of the statistical and the
biochemical parameters. Combining the two assumptions above, we find that

$$
P(x_0|x_2^*)\,P(x_2^*) = \frac{1}{2\pi\,\sigma_{0D}\sigma_{2P}}
\exp\left[-\frac{(x_2^* - \mu_{\rm eff})^2}{2\sigma_{\rm eff}^2}\right]
\exp\left[-\frac{(x_0 - \mu_{2P})^2}{2\sigma^2}\right],
\tag{36}
$$

where

$$
\sigma^2 = \sigma_{0D}^2 + \sigma_{2P}^2, \tag{37}
$$
$$
\frac{1}{\sigma_{\rm eff}^2} = \frac{1}{\sigma_{0D}^2} + \frac{1}{\sigma_{2P}^2}, \tag{38}
$$
$$
\frac{\mu_{\rm eff}}{\sigma_{\rm eff}^2} = \frac{\mu_{2P}}{\sigma_{2P}^2} + \frac{x_0}{\sigma_{0D}^2}. \tag{39}
$$

Footnote 2: In accordance with the description encapsulated in the model, the SNP will act by
causing deficiency in the modus operandi of the system. Should it instead act by causing excess,
then a symmetric interval about $\mu_{2P}$ would be the generalization to account for possible
saturation and consequent screening effect. The changes to implement in all the subsequent
results would be straightforward.

Footnote 3: The distinction between functional and environmental markers can be blurred and will
thus require care.



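Substituting $x_0 = x_0(x_2^*)$, we integrate in $x_2^*$ finding that

$$
P(SNP) = \int_{-\infty}^{+\infty} dx_2^*\, P(x_0|x_2^*)\,P(x_2^*)
= \frac{1}{\sqrt{2\pi}\,\sqrt{\sigma_{0D}^2 + \sigma_{2P}^2 (A_0 - A_2)^2/A_0^2}}
\exp\left[-\frac{1}{2}\,
\frac{[A_1/A_0 + \mu_{2P}(A_0 - A_2)/A_0]^2}{\sigma_{0D}^2 + \sigma_{2P}^2 (A_0 - A_2)^2/A_0^2}\right].
\tag{40}
$$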
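
The Gaussian integral leading to Eqn. (40) can be checked numerically. The Python sketch below
assumes that, after the substitution, the likelihood exponent is quadratic in a linear residual
`a + c * x2`, where `a` and `c` are shorthand introduced here for the offset and slope built from
the $A$ coefficients. All numerical values are made up for the check and are not taken from the data.

```python
from math import exp, pi, sqrt


def gauss(x, mean, sd):
    """Normalised Gaussian density."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sqrt(2.0 * pi) * sd)


def evidence_closed_form(a, c, mu2, s0, s2):
    """Closed form of the integral over x2 of likelihood times prior.

    The residual a + c*x2 has mean a + c*mu2 and variance s0^2 + c^2 s2^2
    once x2 is marginalised over its Gaussian prior N(mu2, s2^2).
    """
    var = s0 ** 2 + (c * s2) ** 2
    return exp(-0.5 * (a + c * mu2) ** 2 / var) / sqrt(2.0 * pi * var)


def evidence_numeric(a, c, mu2, s0, s2, n=20001, width=10.0):
    """Trapezoidal estimate of the same integral over mu2 +/- width*s2."""
    lo, hi = mu2 - width * s2, mu2 + width * s2
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x2 = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * gauss(a + c * x2, 0.0, s0) * gauss(x2, mu2, s2)
    return total * h


if __name__ == "__main__":
    params = dict(a=-16.0, c=0.1, mu2=170.0, s0=2.0, s2=9.2)  # hypothetical values
    print(evidence_closed_form(**params), evidence_numeric(**params))
```

The two numbers agree to high precision, which supports the structure of the closed form: the
prior variance enters the effective variance multiplied by the squared slope of the substitution.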



If we furthermore assume that $x_0$ follows a Gaussian distribution centred at the value for
the canonical path, which we denote by $\mu_0$, and with standard deviation $\sigma_{0D}$, which is
such that $\Delta_0^{\rm thres} = \sqrt{2\sigma_{0D}^2 \ln[1/P(x_0)]}$ when
$x_0 = \mu_0 - \Delta_0^{\rm thres}$, then following a similar reasoning to that for CHD, we will
have presence of the SNP for $x_0 < \mu_0 - \Delta_0^{\rm thres}$. Hence

$$
P(SNP) = \frac{1}{\sqrt{2\pi}\,\sigma_{0D}} \int_{-\infty}^{\mu_0 - \Delta_0^{\rm thres}} dx_0\,
\exp\left[-\frac{(x_0 - \mu_0)^2}{2\sigma_{0D}^2}\right]
= \frac{1}{2}\left[ 1 - {\rm erf}\left( \frac{\Delta_0^{\rm thres}}{\sqrt{2}\,\sigma_{0D}} \right) \right],
\tag{41}
$$

which equals $\lambda_{\rm eff} = 0.30 \pm 0.001$, as computed in Eqn. (9), and thus serves to
constrain the parameters in Eqn. (40). We can also solve for $\Delta_0^{\rm thres}$ finding that
$\Delta_0^{\rm thres} = (0.52 \pm 0.006)\,\sigma_{0D}$. The quantity $\Delta_0^{\rm thres}$
determines the parameter $\beta$ in Eqn. (17). Combining the two conditions above, we find the
value for $P(x_0) = 0.87 \pm 0.003$, which we can interpret as the probability that the SNP has
occurred when $x_0$ is below the threshold value that can trigger the canonical pathway.
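
The two numbers quoted above can be reproduced with a few lines of Python. The sketch assumes only
that the probability in Eqn. (41) is the lower tail of a Gaussian, so that the threshold in units
of the standard deviation is minus the standard normal quantile of 0.30, and that $P(x_0)$ is the
Gaussian factor evaluated at the threshold. The variable names are introduced here for illustration.

```python
from math import exp
from statistics import NormalDist

P_SNP = 0.30  # probability of the SNP quoted in the text

# Eqn. (41): P(SNP) = Phi(-delta/sigma), hence delta/sigma = -Phi^{-1}(P_SNP).
ratio = -NormalDist().inv_cdf(P_SNP)

# Threshold condition delta = sqrt(2 sigma^2 ln[1/P(x0)]) inverted for P(x0).
p_x0 = exp(-0.5 * ratio ** 2)

print(round(ratio, 2), round(p_x0, 2))  # prints: 0.52 0.87
```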

Moreover, having in Eqn. (12) also computed $P(t2DM|CHD)$, we write the corresponding
likelihood

$$
P(y_2^*|x_2^*) = \frac{1}{\sqrt{2\pi}\,s_{2D}}
\exp\left[-\frac{(y_2^* - x_2^*)^2}{2 s_{2D}^2}\right].
\tag{42}
$$



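Note that we cannot follow a reasoning analogous to that for the case of the probability of
CHD because the population $N_{t2DM}$ is not entirely random. Substituting
$y_2^* = y_2^*(x_2^*) = x_2^*/B_2$, where $B_2$ is a function of the biochemical parameters, we
find that

$$
P(t2DM|CHD) = \frac{1}{\sqrt{2\pi}\,s_{2D}} \int_{-\infty}^{\mu_{2P} - \Delta_2^{\rm thres}} dx_2^*\,
\exp\left[-\frac{(1 - B_2)^2\, x_2^{*2}}{2 s_{2D}^2 B_2^2}\right]
= \frac{B_2}{2(1 - B_2)} \left\{ 1 + {\rm erf}\left[
\frac{(\mu_{2P} - \Delta_2^{\rm thres})(1 - B_2)}{\sqrt{2}\, s_{2D} B_2} \right] \right\}.
\tag{43}
$$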
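We can now compute the posterior probability of the variable $x_2^*$ given $x_0$, i.e.

$$
P(x_2^*|x_0) = \frac{1}{2\pi\,\sigma_{0D}\sigma_{2P}}\, \frac{1}{P(x_0)}
\exp\left[-\frac{1}{2}\,\frac{(x_0 - \mu_{2P})^2}{\sigma_{0D}^2 + \sigma_{2P}^2}\right]
\exp\left[-\frac{(x_2^* - \mu_{\rm eff})^2}{2\sigma_{\rm eff}^2}\right],
\tag{44}
$$

finding for the probability that CHD will occur given that the SNP has occurred that

$$
P(CHD|SNP) = \frac{1}{P(SNP)} \int_{-\infty}^{\mu_{2P} - \Delta_2^{\rm thres}} dx_2^*\,
P(x_0|x_2^*)\,P(x_2^*)
= \frac{1}{2}\left\{ 1 + {\rm erf}\left[
\frac{-\sigma_{0D}^2 \Delta_2^{\rm thres}
+ \sigma_{2P}^2\,[A_1 + (\mu_{2P} - \Delta_2^{\rm thres})(A_0 - A_2)](A_0 - A_2)/A_0^2}
{\sqrt{2}\,\sigma_{0D}\sigma_{2P}\sqrt{\sigma_{0D}^2 + \sigma_{2P}^2 (A_0 - A_2)^2/A_0^2}}
\right] \right\}.
\tag{45}
$$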
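
Since $\frac{1}{2}[1 + {\rm erf}(z/\sqrt{2})]$ is the standard normal cumulative distribution, a
conditional probability of the type in Eqn. (45) can be evaluated as a Gaussian CDF of the posterior
of $x_2^*$. The Python sketch below assumes, as a simplification introduced here, that the substituted
likelihood depends on $x_2^*$ through a linear residual `a + c * x2` with hypothetical offset `a` and
slope `c`. It compares the closed form with a brute-force ratio of integrals. All numbers are
illustrative only.

```python
from math import exp, sqrt
from statistics import NormalDist


def prob_chd_given_snp(a, c, mu2, s0, s2, delta):
    """Posterior probability that x2 < mu2 - delta, in closed form.

    The product of the Gaussian likelihood in (a + c*x2) and the Gaussian
    prior N(mu2, s2^2) is Gaussian in x2 with the mean and width below.
    """
    var = s0 ** 2 + (c * s2) ** 2
    post_mean = mu2 - (a + c * mu2) * c * s2 ** 2 / var
    post_sd = s0 * s2 / sqrt(var)
    return NormalDist(post_mean, post_sd).cdf(mu2 - delta)


def prob_chd_given_snp_numeric(a, c, mu2, s0, s2, delta, n=48000, width=12.0):
    """Midpoint-rule ratio of the truncated to the full integral over x2."""
    lo = mu2 - width * s2
    h = 2.0 * width * s2 / n
    below, total = 0.0, 0.0
    for i in range(n):
        x2 = lo + (i + 0.5) * h
        w = exp(-0.5 * ((a + c * x2) / s0) ** 2 - 0.5 * ((x2 - mu2) / s2) ** 2)
        total += w
        if x2 < mu2 - delta:
            below += w
    return below / total


if __name__ == "__main__":
    args = (-16.0, 0.1, 170.0, 2.0, 9.2, 9.2)  # hypothetical values
    print(prob_chd_given_snp(*args), prob_chd_given_snp_numeric(*args))
```

When the slope `c` vanishes the data carry no information on $x_2^*$ and the function returns the
prior tail probability of Eqn. (33), which is a convenient sanity check.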



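The variables in this formal expression are constrained by the relations found above and
which we summarize below:

$$
P(SNP) \equiv f_{00}(x_0, \sigma_{0D}; \sigma_{2P}, \beta) = 0.30 \pm 0.001, \tag{46}
$$
$$
P(SNP|CHD) \equiv f_{02}(x_0, \sigma_{0D}, \Delta_2^{\rm thres}; \sigma_{2P}, \beta) = 0.20 \pm 0.001, \tag{47}
$$
$$
P(t2DM|CHD) \equiv f_{22}(s_{2D}, \Delta_2^{\rm thres}; \sigma_{2P}, \beta) = 0.09 \pm 0.001. \tag{48}
$$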



Here the subscript $D$ indicates properties of the data as derived from the model and
constrained by these particular data, and the subscript $P$ indicates properties of the prior,
which are based on the knowledge inferred from data sets delivered by other experiments. These
functions depend on our knowledge of the rates in the model of the implicated pathways
as well as on the statistical properties of the associated risk factor. However, from these
three relations as constrained by the data, we can solve for three parameters only. Solving
for the remaining parameters requires additional conditions for the statistical properties of
the priors and biochemical parameters. Nonetheless, the idea that the present study serves
to introduce, and which we here applied to one data set on one risk factor, has been
demonstrated, i.e. a) how to extract the statistical properties of the event from the
corresponding phenomenological data and b) how to extract, from the statistical properties of
the event, biochemical information on the causal relations that link the event with the risk factor.



IV. DISCUSSION 

In this manuscript we derive the probability of occurrence of CHD based on data on the 
presence of the SNP at the -308 position of the TNF-a gene. We first worked following a 
bottom-up approach. Comparing different hypotheses for the statistical relation between the 
occurrences of SNP and CHD, we selected the working hypothesis on the basis of the Bayes 
factors. We showed that the data favour the association of the SNP with the occurrence of 
CHD as well as the participation of the occurrence of t2DM in the causal relation. Using the 
Bayes theorem, we computed the probability of the SNP conditional on the occurrence of
CHD. We then worked following a top-down approach. We presented a schematic model for 
a simplistic description of the signalling pathway which relates the presence of the SNP with 
the emergence of CHD. The data contain information on equilibrium states of the several 
variables that describe the biochemical system and can thus be translated into a probability 
description. We then computed the probability of CHD given that the SNP had occurred, 
using for the likelihood the probability previously computed. We expressed the result as a 
function of both the biochemical parameters of the model and the statistical parameters of 
the prior probability distributions. Other probabilities were also computed, which serve as 
constraints to the parameters. 

In an upcoming study we will be exploring the idea further by integrating the sparse 
existing data on various population samplings. We will select the data for CHD given 
different risk factors, and for the SNP given different diseases. From the first selection we 
intend to extract the remaining statistical parameters, since the prior of CHD will be shared. 
Also a link should be established between this formalism and the CHD prediction estimates 
from a multivariable risk calculation [24]. From the second selection we intend to extract 
the biochemical parameters of the signalling pathway. Although the prior of CHD will be 
shared, the biochemical system will grow in complexity and new rates will be introduced. 
We expect, however, that by exhausting the data sets available we will reach a balance of 
unknowns and equations that would allow us to solve the problem. Should this balance 
not be attained, we will resort to simulating data and thus fitting the unknown parameters 
P~U EE]. Whenever available, we will complement the study with temporal information to 
obtain reaction rates. Ultimately we expect to be able to infer a universal law for gene
mutation by systematising the various diseases into a comprehensive model of the signalling 
pathway. 
Appendix A: Computation of the evidence

In this Appendix we compute the evidence for the six hypotheses discussed in Section II.
Hypothesis $H_{00}$ has only one free parameter, the probability that the SNP occurred. This
probability, describing a mutation process, is assumed to have a Poisson distribution
characterized by the size of the population $N$ and a mutation rate $\lambda$. The probability of
$n$ mutations is

$$
P(n|\lambda \wedge N) = \exp[-\lambda N]\, \frac{(\lambda N)^n}{n!} \tag{A1}
$$

with the mean number $\langle n \rangle = \lambda N$. For $n = N_{SNP}$ mutations in a sample of
size $N$ and a uniform prior distribution for $\lambda$, $P(\lambda) = 1$, we find that

$$
P(D_{SNP}|H_{00}) = \int_0^{\infty} d\lambda\, P(D_{SNP}|\lambda \wedge N)\, P(\lambda|H_{00})
= \int_0^{\infty} d\lambda\, \exp[-\lambda N]\, \frac{(\lambda N)^{N_{SNP}}}{N_{SNP}!}
= \frac{1}{N}. \tag{A2}
$$

Hypothesis $H_{01}$ has two parameters, the probabilities that the SNP occurred given the two
values of the variable CHD. There are two possible sources of SNP, namely the population
with CHD and the population without CHD. The presence of the SNP follows a binomial
distribution where $p_{01}$ is the frequency of the SNP for the case of CHD and $p_{0\bar{1}}$ is
the frequency of the SNP for the case of no CHD. It follows that

$$
\begin{aligned}
P(D_{SNP}|H_{01}) &= \int dp_{01} \int dp_{0\bar{1}}\,
P(D_{SNP}|p_{01} \wedge p_{0\bar{1}} \wedge H_{01})\, P(p_{01} \wedge p_{0\bar{1}}|H_{01}) \\
&= \int dp_{01}\, P(D_{SNP}|p_{01} \wedge H_{01})\, P(p_{01}|H_{01})
+ \int dp_{0\bar{1}}\, P(D_{SNP}|p_{0\bar{1}} \wedge H_{01})\, P(p_{0\bar{1}}|H_{01}).
\end{aligned}
\tag{A3}
$$

Moreover, assuming a uniform prior probability distribution for the frequencies $p_{01}$ and
$p_{0\bar{1}}$ of the data on the SNP given respectively the occurrence or non-occurrence of CHD

$$
P(p_{01}|H_{01}) = 1, \qquad P(p_{0\bar{1}}|H_{01}) = 1, \tag{A4}
$$



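we find that

$$
\begin{aligned}
P(D_{SNP}|H_{01}) &= \int dp_{01} \binom{N_{CHD}}{N_{SNP,CHD}}\,
p_{01}^{N_{SNP,CHD}} (1 - p_{01})^{N_{\overline{SNP},CHD}}
+ \int dp_{0\bar{1}} \binom{N_{\overline{CHD}}}{N_{SNP,\overline{CHD}}}\,
p_{0\bar{1}}^{N_{SNP,\overline{CHD}}} (1 - p_{0\bar{1}})^{N_{\overline{SNP},\overline{CHD}}} \\
&= \binom{N_{CHD}}{N_{SNP,CHD}} \frac{N_{SNP,CHD}!\, N_{\overline{SNP},CHD}!}{(N_{CHD} + 1)!}
+ \binom{N_{\overline{CHD}}}{N_{SNP,\overline{CHD}}}
\frac{N_{SNP,\overline{CHD}}!\, N_{\overline{SNP},\overline{CHD}}!}{(N_{\overline{CHD}} + 1)!} \\
&= \frac{1}{N_{CHD} + 1} + \frac{1}{N_{\overline{CHD}} + 1}.
\end{aligned}
\tag{A5}
$$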
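
Each integral of a binomial likelihood over a uniform prior is a Beta integral, and the binomial
coefficient cancels the factorials so that only $1/(N + 1)$ survives. The Python sketch below
verifies this in exact rational arithmetic. The function names and the counts used in the example
are hypothetical and are not the counts of the data set analysed in the text.

```python
from fractions import Fraction
from math import comb, factorial


def binomial_evidence(n_total, n_snp):
    """C(N, n) * n! * (N - n)! / (N + 1)!, the uniform-prior binomial integral."""
    numerator = comb(n_total, n_snp) * factorial(n_snp) * factorial(n_total - n_snp)
    return Fraction(numerator, factorial(n_total + 1))


def evidence_h01(n_chd, n_snp_chd, n_no_chd, n_snp_no_chd):
    """Sum of the two pool contributions, one with CHD and one without."""
    return binomial_evidence(n_chd, n_snp_chd) + binomial_evidence(n_no_chd, n_snp_no_chd)


if __name__ == "__main__":
    # Hypothetical counts: 10 subjects with CHD (3 carriers), 20 without (5 carriers).
    print(evidence_h01(10, 3, 20, 5))  # equals 1/11 + 1/21
```

The result does not depend on the number of carriers in each pool, only on the pool sizes.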

Similarly to $H_{01}$, hypothesis $H_{10}$ has two parameters, the probabilities that the SNP
occurred given the two values of the variable t2DM. The evidence is given by the same
expression as that of hypothesis $H_{01}$ with the variable CHD replaced by the variable t2DM
and under the analogous assumptions on the corresponding frequency priors $p_{10}$ and
$p_{\bar{1}0}$.

Hypothesis $H_{11}$ has four parameters, one for each state of the variables CHD and t2DM.
This hypothesis combines the two hypotheses previously discussed, which are assumed
complementary, thus being a case of a two-component hypothesis with probabilities $\beta$ and
$(1 - \beta)$ [19]. It follows that

$$
\begin{aligned}
P(D_{SNP}|H_{11}) &= \int dp_{01} \int dp_{10} \left[
\beta\, P(D_{SNP}|p_{01} \wedge H_{11})\, P(p_{01}|H_{11})
+ (1 - \beta)\, P(D_{SNP}|p_{10} \wedge H_{11})\, P(p_{10}|H_{11}) \right] \\
&\quad + \int dp_{0\bar{1}} \int dp_{\bar{1}0} \left[
\bar{\beta}\, P(D_{SNP}|p_{0\bar{1}} \wedge H_{11})\, P(p_{0\bar{1}}|H_{11})
+ (1 - \bar{\beta})\, P(D_{SNP}|p_{\bar{1}0} \wedge H_{11})\, P(p_{\bar{1}0}|H_{11}) \right],
\end{aligned}
\tag{A6}
$$



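which yields

$$
\begin{aligned}
P(D_{SNP}|H_{11}) &= \beta \binom{N_{CHD}}{N_{SNP,CHD}}
\frac{N_{SNP,CHD}!\, N_{\overline{SNP},CHD}!}{(N_{CHD} + 1)!}
+ (1 - \beta) \binom{N_{t2DM}}{N_{SNP,t2DM}}
\frac{N_{SNP,t2DM}!\, N_{\overline{SNP},t2DM}!}{(N_{t2DM} + 1)!} \\
&\quad + \bar{\beta} \binom{N_{\overline{CHD}}}{N_{SNP,\overline{CHD}}}
\frac{N_{SNP,\overline{CHD}}!\, N_{\overline{SNP},\overline{CHD}}!}{(N_{\overline{CHD}} + 1)!}
+ (1 - \bar{\beta}) \binom{N_{\overline{t2DM}}}{N_{SNP,\overline{t2DM}}}
\frac{N_{SNP,\overline{t2DM}}!\, N_{\overline{SNP},\overline{t2DM}}!}{(N_{\overline{t2DM}} + 1)!} \\
&= \frac{\beta}{N_{CHD} + 1} + \frac{1 - \beta}{N_{t2DM} + 1}
+ \frac{\bar{\beta}}{N_{\overline{CHD}} + 1} + \frac{1 - \bar{\beta}}{N_{\overline{t2DM}} + 1}.
\end{aligned}
\tag{A7}
$$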



Here $\beta = N_{CHD}/(N_{CHD} + N_{t2DM})$ is the probability that the data were extracted from
the pool of hypothesis $H_{01}$ and
$\bar{\beta} = N_{\overline{CHD}}/(2N_{\overline{CHD},\overline{t2DM}} + N_{\overline{CHD},t2DM} + N_{CHD,\overline{t2DM}})$
the probability that the pool is that of the complement of $H_{01}$. Similarly we define
$(1 - \beta)$ and $(1 - \bar{\beta})$ from hypothesis $H_{10}$.
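
To make the bookkeeping of the pool weights concrete, the following Python sketch computes the
evidence of the two-component hypothesis from the four cells of a CHD by t2DM contingency table.
It assumes the final form in which each pool contributes its weight divided by the pool size plus
one. The function name, argument names and counts are hypothetical.

```python
from fractions import Fraction


def evidence_h11(n_both, n_chd_only, n_t2dm_only, n_neither):
    """Evidence of the mixture hypothesis from the four cell counts.

    n_both:      subjects with CHD and t2DM
    n_chd_only:  subjects with CHD and without t2DM
    n_t2dm_only: subjects with t2DM and without CHD
    n_neither:   subjects with neither condition
    """
    n_chd = n_both + n_chd_only
    n_t2dm = n_both + n_t2dm_only
    n_no_chd = n_t2dm_only + n_neither
    n_no_t2dm = n_chd_only + n_neither

    # Weight of the CHD pool against the t2DM pool, and of their complements.
    beta = Fraction(n_chd, n_chd + n_t2dm)
    beta_bar = Fraction(n_no_chd, n_no_chd + n_no_t2dm)

    return (beta / (n_chd + 1) + (1 - beta) / (n_t2dm + 1)
            + beta_bar / (n_no_chd + 1) + (1 - beta_bar) / (n_no_t2dm + 1))


if __name__ == "__main__":
    print(evidence_h11(5, 5, 5, 5))  # symmetric toy table: 2/11
```

The denominator of the complementary weight is the sum of the two complementary pool sizes, which
counts the cell with neither condition twice, as in the expression given above.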



For hypothesis $H_{11}^{chd}$ we have four parameters for combined states of the variables CHD
and t2DM, namely $CHD \wedge \overline{t2DM}$ and $\overline{CHD} \wedge \overline{t2DM}$ as the
states which are conditional on non-occurrence of t2DM, and $t2DM \wedge CHD$ and
$\overline{t2DM} \wedge CHD$ as the states which are conditional on occurrence of CHD. The
corresponding four frequencies are as follows: $p_{01}$ is the frequency of SNP given the
occurrence of CHD and $p_{0\bar{1}}$ the frequency of SNP given non-occurrence of CHD, both
subject to non-occurrence of t2DM; $p_{11}$ is the frequency of SNP given that t2DM has occurred
and $p_{\bar{1}1}$ the frequency of SNP given that t2DM has not occurred, both subject to CHD
having occurred. For a uniform prior probability of these frequencies, we find that

$$
\begin{aligned}
P(D_{SNP}|H_{11}^{chd}) &= \int dp_{01} \int dp_{11} \left[
\gamma\, P(D_{SNP}|p_{01} \wedge H_{11}^{chd})\, P(p_{01}|H_{11}^{chd})
+ (1 - \gamma)\, P(D_{SNP}|p_{11} \wedge H_{11}^{chd})\, P(p_{11}|H_{11}^{chd}) \right] \\
&\quad + \int dp_{0\bar{1}} \int dp_{\bar{1}1} \left[
\bar{\gamma}\, P(D_{SNP}|p_{0\bar{1}} \wedge H_{11}^{chd})\, P(p_{0\bar{1}}|H_{11}^{chd})
+ (1 - \bar{\gamma})\, P(D_{SNP}|p_{\bar{1}1} \wedge H_{11}^{chd})\, P(p_{\bar{1}1}|H_{11}^{chd}) \right],
\end{aligned}
\tag{A8}
$$
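which yields

$$
\begin{aligned}
P(D_{SNP}|H_{11}^{chd}) &= \gamma \binom{N_{CHD,\overline{t2DM}}}{N_{SNP,CHD,\overline{t2DM}}}
\frac{N_{SNP,CHD,\overline{t2DM}}!\, N_{\overline{SNP},CHD,\overline{t2DM}}!}{(N_{CHD,\overline{t2DM}} + 1)!}
+ (1 - \gamma) \binom{N_{CHD,t2DM}}{N_{SNP,CHD,t2DM}}
\frac{N_{SNP,CHD,t2DM}!\, N_{\overline{SNP},CHD,t2DM}!}{(N_{CHD,t2DM} + 1)!} \\
&\quad + \bar{\gamma} \binom{N_{\overline{CHD},\overline{t2DM}}}{N_{SNP,\overline{CHD},\overline{t2DM}}}
\frac{N_{SNP,\overline{CHD},\overline{t2DM}}!\, N_{\overline{SNP},\overline{CHD},\overline{t2DM}}!}{(N_{\overline{CHD},\overline{t2DM}} + 1)!}
+ (1 - \bar{\gamma}) \binom{N_{CHD,\overline{t2DM}}}{N_{SNP,CHD,\overline{t2DM}}}
\frac{N_{SNP,CHD,\overline{t2DM}}!\, N_{\overline{SNP},CHD,\overline{t2DM}}!}{(N_{CHD,\overline{t2DM}} + 1)!} \\
&= \frac{\gamma}{N_{CHD,\overline{t2DM}} + 1} + \frac{1 - \gamma}{N_{CHD,t2DM} + 1}
+ \frac{\bar{\gamma}}{N_{\overline{CHD},\overline{t2DM}} + 1}
+ \frac{1 - \bar{\gamma}}{N_{CHD,\overline{t2DM}} + 1}.
\end{aligned}
\tag{A9}
$$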

Here $\gamma = N_{CHD,\overline{t2DM}}/N_{CHD}$ and
$\bar{\gamma} = N_{\overline{CHD},\overline{t2DM}}/N_{\overline{t2DM}}$.

Similarly to $H_{11}^{chd}$, hypothesis $H_{11}^{t2dm}$ has four parameters for combined states
of the variables CHD and t2DM, namely $t2DM \wedge \overline{CHD}$ and
$\overline{t2DM} \wedge \overline{CHD}$ as the states conditional on non-occurrence of CHD, and
$CHD \wedge t2DM$ and $\overline{CHD} \wedge t2DM$ as the states conditional on occurrence of
t2DM. The frequencies are defined analogously to those of hypothesis $H_{11}^{chd}$. The evidence
is given by the same expression as that of hypothesis $H_{11}^{chd}$ with $p_{01}$ and
$p_{0\bar{1}}$ replaced by the frequency of SNP, subject to non-occurrence of CHD, given the
occurrence or non-occurrence of t2DM, respectively $p_{10}$ and $p_{\bar{1}0}$, and
$p_{\bar{1}1}$ replaced by $p_{1\bar{1}}$. Analogously $\gamma$ is replaced by
$N_{\overline{CHD},t2DM}/N_{t2DM}$ and $\bar{\gamma}$ by
$N_{\overline{CHD},\overline{t2DM}}/N_{\overline{CHD}}$.



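Appendix B: Calculation of $P(SNP|H_{00})$

In this Appendix we compute, for the purpose of comparison, the probability of occurrence
of SNP for hypothesis $H_{00}$. Starting from Eqn. (A1) and using the Bayes theorem, we find
for the posterior probability of $\lambda$ that

$$
P(\lambda|D_{SNP} \wedge H_{00}) =
\frac{P(D_{SNP}|\lambda \wedge N)\, P(\lambda|H_{00})}{P(D_{SNP}|H_{00})}. \tag{B1}
$$

The normalizing constant is the evidence computed in Eqn. (A2). Given the data and for
$P(SNP|\lambda \wedge H_{00}) = \lambda$, the probability of a mutation in a population of size
$N$ is

$$
P(SNP|H_{00}) = \int_0^{\infty} d\lambda\, P(SNP|\lambda \wedge H_{00})\,
P(\lambda|D_{SNP} \wedge H_{00})
= \int_0^{\infty} d\lambda\, \lambda N \exp[-\lambda N]\, \frac{(\lambda N)^{N_{SNP}}}{N_{SNP}!}
= \frac{N_{SNP} + 1}{N}, \tag{B2}
$$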

which for $N_{SNP} = 184$ and $N = 683$ yields $P(SNP|H_{00}) = 0.27$. Despite the small
difference between the values derived from the two hypotheses (which might be considered
insignificant given the observational errors, which are of order $1/\sqrt{N} \approx 0.038$), the
fact that there is a difference highlights the relevance of hypothesis testing before committing
to a probability which will act as likelihood in subsequent calculations.
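
The arithmetic behind the quoted value can be checked directly. With a uniform prior on the
mutation rate, the posterior from the Poisson likelihood is a Gamma distribution with shape
$N_{SNP} + 1$ and rate $N$, whose mean is $(N_{SNP} + 1)/N$. The short Python sketch below evaluates
this mean and the order of magnitude of the observational error for the counts given in the text.
The variable names are introduced here for illustration.

```python
from fractions import Fraction
from math import sqrt

N_SNP = 184  # number of SNP carriers quoted in the text
N = 683      # sample size quoted in the text

# Posterior mean of the mutation rate under a uniform prior.
p_snp_h00 = Fraction(N_SNP + 1, N)

# Order of magnitude of the observational error.
error_scale = 1.0 / sqrt(N)

print(round(float(p_snp_h00), 2), round(error_scale, 3))  # prints: 0.27 0.038
```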



[1] Vassali P, The pathophysiology of tumor necrosis factors, Annu. Rev. Immunol. 10 (1992) 411
[2] Vendrell J, Fernandez-Real J-M, Gutierrez C, Zamora A, Simon I, Bardaji A, Ricart W and
Richart C, A polymorphism in the promoter of the tumor necrosis factor-α gene (-308) is
associated with coronary disease in type 2 diabetic patients, Atherosclerosis 167 (2003) 257
[3] Elahi MM, Gilmour A, Matata BM and Mastana SS, A variant of position -308 of the Tumor
necrosis factor alpha gene promoter and the risk of coronary heart disease, Heart Lung Circ.
17(1) (2008) 14
[4] Dedoussis GV, Panagiotakos DB, Vidra NV, Louizou E, Chrysohoou C, Germanos A, Mantas Y,
Tokmakidis S, Pitsavos C and Stefanadis C, Association between TNF-alpha -308G>A
polymorphism and the development of acute coronary syndromes in Greek subjects: the
CARDIO2000-GENE Study, Genet. Med. 7(6) (2005) 411
[5] Vourvouhaki E and Dedoussis GV, Cholesterol ester transfer protein: a therapeutic target in
atherosclerosis? Expert Opin. Ther. Targets 12 (2008) 937
[6] Cancello R, Tounian A, Poitou CH and Clement K, Adiposity signals, genetic and body weight
regulation in humans, Diabetes Metab. 30 (2004) 215
[7] Hoffstedt J, Eriksson P, Hellstrom L, Rossner S, Ryden M and Arner P, Excessive fat
accumulation is associated with the TNF alpha-308 G/A promoter polymorphism in women but
not in men, Diabetologia 43 (2000) 117
[8] Walston J, Seibert M, Yen CJ, Cheskin LJ and Andersen RE, Tumor necrosis factor-alpha-238
and -308 polymorphisms do not associate with traits related to obesity and insulin resistance,
Diabetes 48 (1999) 2096
[9] Stephens M and Balding DJ, Bayesian statistical methods for genetic association studies,
Nature Reviews 10 (2009) 681
[10] Brewer BJ and Lewis GF, Strong Gravitational Lens Inversion: A Bayesian Approach,
Astrophys. J. 637 (2006) 608-619 (arXiv:astro-ph/0509863v1)
[11] Trotta T, Applications of Bayesian model selection to cosmological parameters, Mon. Not.
Roy. Astron. Soc. 378 (2007) 72 (arXiv:astro-ph/0504022v3)
[12] Bridges M, Feroz F, Hobson MP and Lasenby AN, Bayesian optimal reconstruction of the
primordial power spectrum (arXiv:0812.3541v1 [astro-ph])
[13] Kiam S, Imoto S and Miyano S, Dynamic Bayesian network and nonparametric regression for
non-linear modeling of gene network from time series gene expression data, BioSystems 75
(2004) 57
[14] Vyshemirsky V and Girolami MA, Bayesian Ranking of Biochemical System Models,
Bioinformatics 24(6) (2008) 833
[15] Toni T, Welch D, Strelkowa N, Ipsen A and Stumpf MPH, Approximate Bayesian computation
scheme for parameter inference and model selection in dynamical systems, J. Royal Society
Interface 6, 31 (2009) 187 (arXiv:0901.1925v1 [stat.CO]); Toni T and Stumpf PH, Parameter
inference and model selection in signaling pathway models (arXiv:0904.4468v1 [q-bio.QM])
[16] MacKay DJC, Information Theory, Inference, and Learning Algorithms, Cambridge University
Press, 2003
[17] Kass RE, Raftery AE, Bayes Factors, J. American Statistical Association 90, 430 (1995) 773
[18] Frank AF, The Common Patterns of Nature (arXiv:0906.3597v1 [q-bio.QM])
[19] Guglielmetti F, Fischer R and Dose V, Background-source separation in astronomical images
with Bayesian probability theory (I): the method (arXiv:0903.2342 [astro-ph.IM])
[20] Komarova NL, Zou X, Nie Q and Bardwell L, A theoretical framework for specificity in cell
signalling, Mol. Systems Biology 4100031 (2005); Bardwell L, Zou X, Nie Q and Komarova NL,
Mathematical Models of Specificity in Cell Signaling, Biophys. J. 92 (2007) 3425
[21] Wodarz D and Komarova NL, Computational Biology of Cancer, Lecture Notes and
Mathematical Modeling, World Scientific, 2005
[22] Robinson SD, Dawson P, Ludlam CA, Boon NA and Newby DE, Vascular and fibrinolytic
effects of intra-arterial tumour necrosis factor-α in patients with coronary heart disease,
Clinical Science 110 (2006) 353
[23] Lupton R, Statistics in Theory and Practice, Princeton University Press, 1993
[24] Wilson PWF, D'Agostino RB, Levy D, Belanger AM, Silbershatz H and Kannel WB, Prediction
of Coronary Heart Disease Using Risk Factor Categories, Circulation 97 (1998) 1837
[25] Vourvouhaki E, Carvalho C and Aguiar P, Model for Osteosarcoma-9 as a Potent Factor
in Cell Survival and Resistance to Apoptosis, Phys. Rev. E76 (2007) 011926 (arXiv:0608030
[q-bio.SC])



